1 Introduction

2 Description of the dataset

The dataset for this investigation covers all Airbnb offerings in London as of 4 and 5 March 2017. It contains 53,904 listings described by 95 variables. Its source is the website “Inside Airbnb - Adding data to the debate” (Cox, 2017), an independent, non-commercial project that aims to examine the effect of Airbnb activities on urban development.

Prior to using the dataset for this project, some adjustments and additions had to be made. The operations conducted and the resulting data can be grouped as follows:

  • Price: Reflects the price per night as offered on Airbnb.
  • Rent per Zipcode: The initial dataset holds no information on the regular rent at the location of an Airbnb. Rent, however, can be considered the major cost of providing an Airbnb. Therefore, we used the first half of the zipcode to assign the average rent for a single-bedroom flat to every listing. The rent information was drawn from SOURCE.
  • Distance / Location: Another important aspect of an Airbnb is its distance to the city centre. Taking Piccadilly Circus (longitude: -0.133869, latitude: 51.510067) as the tourist centre of London, we calculated the straight-line (great-circle) distance of every Airbnb from this point using the Haversine formula. Furthermore, Cartesian coordinates were calculated following the instructions of Irawan (2014), so that the data could be plotted on the maps provided by Lovelace & Cheshire (2014).
  • Reviews: The average customer review scores from Airbnb and the total number of reviews.
  • Property Characteristics: General information on the property, such as the room type, the number of people that can be accommodated and the number of bathrooms.
  • Amenities: On top of these characteristics, Airbnb provides information on a wide range of amenities for every flat. These range from Internet access and a TV to a personal doorman or a pool. We introduced dummy variables for 52 different amenities as well as a variable counting the total number of amenities.
  • Offering Characteristics: Lastly, some information on the offering itself was included, such as the cancellation policy and whether the Airbnb is instantly bookable.
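
The derived variables described above can be illustrated with a minimal Python sketch. It is not the code used for this project. The function names, the mean Earth radius of 6371 km, the assumption that a zipcode separates its two halves with a space, and the assumption that the raw amenities field is a brace-delimited string such as {TV,"Cable TV",Internet} are all illustrative choices. The conversion to Cartesian coordinates is not shown.

```python
import csv
import math

# Piccadilly Circus coordinates as given in the text (degrees).
CENTRE_LAT, CENTRE_LON = 51.510067, -0.133869
EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def haversine_km(lat, lon, lat0=CENTRE_LAT, lon0=CENTRE_LON):
    """Great-circle distance in km between (lat, lon) and (lat0, lon0)."""
    phi0, phi1 = math.radians(lat0), math.radians(lat)
    dphi = phi1 - phi0
    dlam = math.radians(lon - lon0)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi0) * math.cos(phi1) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def zip_first(zipcode):
    """First half of a zipcode, e.g. 'W1D 3QU' -> 'W1D'.

    Assumes the two halves are separated by whitespace; returns '' if empty.
    """
    parts = zipcode.split()
    return parts[0].upper() if parts else ""


def amenity_dummies(raw, amenities):
    """Return (dummies, count) for a raw amenities string.

    `raw` is assumed to look like '{TV,"Cable TV",Internet}'.
    `dummies` maps each name in `amenities` to 1 if listed, else 0;
    `count` is the number of distinct amenities listed.
    """
    inner = raw.strip().strip("{}")
    listed = set(next(csv.reader([inner]))) if inner else set()
    dummies = {name: int(name in listed) for name in amenities}
    return dummies, len(listed)
```

For instance, haversine_km(51.5081, -0.0759) gives the distance in kilometres from Piccadilly Circus to a point near the Tower of London, and amenity_dummies(raw, ["TV", "Pool"]) returns one 0/1 flag per requested amenity together with the total count.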

To keep this investigation focused on its actual goal of helping students find the right place for their desired Airbnb, some filters were applied as well: only apartments with a private room and at least three valid ratings were included. The resulting dataset has 78 variables and 6,495 listings.
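
The filter step can be sketched as follows. This is an illustration under stated assumptions rather than the procedure actually used: it treats each listing as a dictionary keyed by the column names from the appendix, assumes that a private room is encoded as the room_type value "Private room", and interprets "at least three valid ratings" as number_of_reviews being at least 3. No filter on property_type is applied here.

```python
def keep_listing(row, min_reviews=3):
    """True if the listing is a private room with enough reviews.

    Assumes row["room_type"] uses the value "Private room" and that
    row["number_of_reviews"] is an integer or a numeric string
    (missing or empty values count as zero reviews).
    """
    n_reviews = int(row.get("number_of_reviews") or 0)
    return row.get("room_type") == "Private room" and n_reviews >= min_reviews


def filter_listings(rows, min_reviews=3):
    """Keep only the listings that pass keep_listing."""
    return [row for row in rows if keep_listing(row, min_reviews)]
```

Applied to the full table, filter_listings(rows) would return the reduced set of listings on which the remaining analysis is based.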

3 Mapping

Bibliography

Cox, M. (2017) Inside Airbnb - Adding data to the debate. [Online]. Available from: http://data.insideairbnb.com/united-kingdom/england/london/2017-03-04/data/listings.csv.gz.

Irawan, D.E. (2014) How to convert lat-long coordinates to UTM. [Online]. Available from: https://rpubs.com/dasaptaerwin/19879.

Lovelace, R. & Cheshire, J. (2014) Introduction to visualising spatial data in R. National Centre for Research Methods Working Papers. [Online] 14 (03). Available from: https://github.com/Robinlovelace/Creating-maps-in-R.

Appendix

Column Numbers   Name                  Description
1                price                 Price per night as offered on Airbnb
2                zip_first             First half of the London zipcode
3                mean_rent             Mean rent for the given zipcode as per SOURCE
4                distance              Distance from Piccadilly Circus in km
5 - 6            east & north          Geographic Cartesian coordinates required for map plotting
7 - 13           review_scores         Average customer reviews from Airbnb
14               number_of_reviews     Number of customer reviews
15               property_type
16               room_type
17               accommodates
18               bathrooms
19               bedrooms
20               beds
21               amenities_count
22 - 74          amen                  Dummy variables for the various amenities
75               minimum_nights
76               instant_bookable
77               cancellation_policy

Imperial College Business School